Abstract

The Sync Model, a parallel execution method for logic programming, is proposed.
The Sync Model is a multiple-solution data-driven model that realizes AND
parallelism and OR parallelism in a logic program, assuming a message-passing
multiprocessor system. AND parallelism is implemented by constructing a dynamic
data flow graph of the literals in the clause body with an ordering algorithm. OR
parallelism is achieved by adding special synchronization signals to the stream of
partial solutions and synchronizing the multiple streams with a merge algorithm.
The ordering algorithm and the merge algorithm are described. The merge algorithm
is proved to be correct, and therefore the Sync Model is proved complete, i.e.,
the execution of a logic program under the Sync Model generates all the solutions.


The research described in this paper was sponsored by the Defense Advanced
Research Projects Agency, ARPA Order No. 3771, and monitored by the Office of
Naval Research under contract number N00014-79-C-0597.

1 Introduction

One way to improve the efficiency of the execution of a logic program is to exploit the potential
parallelism, namely AND parallelism and OR parallelism, inherent in the program. In this paper,
a method called the “Sync Model” is proposed for the parallel execution of logic programs on
a message-passing multiprocessor system. The method realizes both AND parallelism and OR
parallelism. OR parallelism (the parallel execution of all clauses that are unifiable with the goal)
is easier to realize than AND parallelism because the executions of OR clauses are independent
of each other. On a message-passing system, the synchronization of the multiple solutions
generated by different processes is the major problem in the implementation of OR parallelism. AND
parallelism (the parallel execution of AND literals in a clause body) may result in binding conflicts
for a variable shared by several literals.

Constructing a data flow graph is the most common approach to AND parallelism. By allowing
exactly one producer for each shared variable, binding conflicts can be eliminated. One problem with
the data flow approach is that the data flow graph changes dynamically according to the binding
values transmitted within the graph. When a variable is bound to a partially instantiated term
containing another variable, binding conflicts may occur. Therefore, the data flow graph needs to be
modified to enforce the “one producer per variable” rule for the new variable. In most computation
models for concurrent logic programming languages, the data flow graph of literals in a clause body
is constructed by the programmer through variable annotations. Alternatively, the data flow graph
can be constructed automatically by the system, either dynamically, as in Conery’s AND/OR
process model [2], or statically, as in Chang and DeGroot’s static data dependency analysis
[1,3]. In the Sync Model presented in this paper, the data flow graph is dynamically constructed
after each unification and is modified by adding “dynamic links” when partially instantiated terms
are detected in a binding by a run-time type checking algorithm similar to that of [3]. The algorithm
is more efficient than that of [2], and the graph it constructs reveals more parallelism than that of
[1]. Optional variable annotations from the programmer may help in constructing the data flow graph.

Implementing both AND parallelism and OR parallelism in one model is a difficult task. The
synchronization of partial solution streams in AND processes has never been solved satisfactorily.
Either AND parallelism is suppressed by connecting sibling AND processes into a linear chain
[7,9], or OR parallelism is reduced by using backtracking [2]. In the Sync Model, a synchronization
mechanism is proposed to synchronize the multiple partial solutions so that all the solutions of a
given problem are produced without explicit request. Therefore, the Sync Model is a
multiple-solution data-driven model.

The language we choose for the Sync Model is an extended logic programming language, called
CLP, with optional variable annotations and a commit operator. Variable annotations, the input
annotation (“?”) and the output annotation (“!”), are used in the clause body to specify the
producer and the consumers of a shared variable. The commit operator (“→”) is used to serialize
the execution of two parts of the clause body. CLP is not designed as a concurrent language.
The variable annotations and the commit operator are used to achieve more efficient execution
under the Sync Model, but they are not required and do not change the semantics of the language.
Therefore, although the Sync Model is designed for CLP, any Horn-clause logic program can be
executed under the Sync Model.

The target machine for the Sync Model is a message-passing multiprocessor system with the
processors interconnected into an augmented binary tree, called the Sneptree [8,5]. Since the
mapping of an unbounded binary tree onto the Sneptree is done automatically and the mapping of a
complete binary tree onto the Sneptree is always optimal, the Sneptree is an ideal architecture for
the Sync Model.

One of the major distinctions between the Sync Model and the computation models for other
concurrent logic programming languages, such as Concurrent Prolog [13], is that in our model a
process is suspended when waiting for an input from an input channel, while in Concurrent Prolog
a process is suspended when it attempts to unify a read-only variable with a non-variable term. In
our approach, all the input variables are bound before the unification, so the unification rule
is not changed. In Concurrent Prolog and other similar approaches [11,12], the unification rules
are modified to handle variable annotations. As a consequence, the variable annotations may be
propagated to other non-annotated variables, and a read-only variable may get instantiated in a
unification.

The rest of the paper is organized as follows. In the next section, the language and the Sync
Model are described. We also address, and propose solutions to, the main problem in constructing
the data flow graph (a binding to a partially instantiated term causes the data flow graph to
change), as well as the synchronization problem of multiple partial solutions in the data flow
graph. In sections 4 and 5, the two main algorithms of the Sync Model, i.e., the ordering algorithm
and the merge algorithm, are presented. We also prove the correctness and completeness of the
merge algorithm and the Sync Model.

2  The  Language  and  the  Sync  Model 

2.1  The  Language 

The language, called CLP (which stands for Concurrent Logic Programming), is an extended
logic programming language with variable annotations and guarded clauses.

A  CLP  program  is  a  finite  set  of  guarded  clauses  of  the  form 

A ← G1, G2, ..., Gm → B1, B2, ..., Bn.

where A is called the head of the clause, (G1, ..., Gm) the guard of the clause, and (B1, ..., Bn)
the body.

The guard of a clause may be empty. When the guard is empty, the commit operator is
omitted. When both the guard and the body are empty, the clause is called a unit clause. Both
the guard and the body are sets of literals. The two sets are separated by a commit operator, “→”.
Declaratively, the commit operator reads like a conjunction: A is true if G1, ..., and Gm, as well
as B1, ..., and Bn, are true. Operationally, the commit operator forces the sequential execution of
the guard and the body: a goal A' which is unifiable with A can be reduced to B1, ..., and Bn if
and only if the guard literals G1, ..., Gm evaluate to true.

A variable can be a simple variable, an output variable annotated by the postfix operator
“!”, or an input variable annotated by the postfix operator “?”. Variable annotations are not allowed in
the clause head. This restriction prohibits annotated variables from appearing in the unification.
Therefore, Robinson’s unification algorithm can be used directly without any modification. A
variable is “shared” when it appears in more than one literal in the body. For a shared variable
in the body, at most one literal containing that variable is allowed to have it annotated as output.
Such a literal is called the producer of that variable, and the literals that contain it as an input
variable are called the consumers of that variable. The guard may not share any variable with the
clause head or the body after unification: a guard evaluates to true or false without generating
any outputs. Variables shared between guard literals are allowed, however. This syntactic restriction
separates the guard and the body into two independent parts, which simplifies the implementation
of our model. In each CLP program, there is a goal of the form “G”.

Unlike other parallel logic programming languages, the extra language constructs in CLP, the
variable annotations and the commit operator, do not affect the semantics of the language. They
can be used by the programmer optionally to achieve more efficient execution under the Sync
Model. In order to prevent the semantics from being changed by the commit operator, when the
restriction on the variables of the guard is violated, the system simply ignores the commit operator
and executes the guard and the body in parallel.

Executing a logic program amounts to constructing and searching the AND/OR tree of the program.
For a given goal and a program, there exists a unique AND/OR tree which represents the complete
search space of the goal. The Sync Model constructs a tree of processes corresponding to the
AND/OR tree of the program and searches the tree in a breadth-first manner.

2.2  The  Sync  Model 

The computation model of CLP, called the Sync Model, is a process model. Two types of
processes are created and terminated dynamically during the computation. An AND process
corresponds to a goal, and an OR process corresponds to a clause that is used to reduce a specific goal. A
tree of interleaved AND and OR processes, called the process tree, is constructed corresponding to
the AND/OR tree of the program. The initial goal is assigned to an AND process, which becomes
the root of the process tree. For each clause whose head is unifiable with the goal of an AND
process, one OR process is spawned to carry out the unification and the reduction of this OR clause.
After unification succeeds in an OR process, the reduction of the goal is carried out by spawning
one AND process for each literal in the body and then reducing the goals in the AND processes
concurrently. If the clause in an OR process has a nonempty guard, a set of AND processes, one
for each goal in the guard, is spawned first. When all the AND processes for the guard terminate
successfully, the OR process can spawn processes for the goals in the body and proceed. When any
of the guard literals fails, the OR process fails. Therefore, full OR parallelism is implemented in
this model through parallel unification of all the unifiable clauses, parallel evaluation of all
the guard literals, and parallel execution of all the OR branches that succeed in unification.

A  leaf  node  of  the  process  tree  is  either  an  OR  process  which  fails  to  unify,  or  an  OR  process 
corresponding  to  a  unit  clause,  or  an  AND  process  corresponding  to  a  built-in  predicate.  An 
OR  process  containing  a  unit  clause  returns  the  variable  bindings  to  its  father  AND  process  and 
terminates  if  it  succeeds  in  unification.  An  AND  process  corresponding  to  a  built-in  predicate 
evaluates  the  predicate  directly  and  sends  the  variable  bindings  to  proper  destination  processes. 
A non-leaf AND process succeeds when at least one of its OR descendants succeeds. It receives
the  bindings  of  its  output  variables  from  the  descendants  and  sends  them  out  to  its  father  and 
all  the  sibling  consumer  processes  of  its  output  variables.  A  non-leaf  OR  process  succeeds  when 
all  its  descendant  AND  processes  successfully  terminate.  It  merges  the  results  received  from  its 
descendants  and  then  sends  them  to  its  father. 
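
The success rule for the process tree can be illustrated with a small sketch. It is not part of the Sync Model itself: the `Process` class and the `succeeds` function are hypothetical names introduced here, and the sketch ignores the solution streams and only tracks success or failure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    kind: str                                   # "AND" or "OR"
    children: List["Process"] = field(default_factory=list)
    leaf_success: bool = False                  # outcome of a leaf process


def succeeds(proc: Process) -> bool:
    """Success rule of the process tree, as described in the text."""
    if not proc.children:
        return proc.leaf_success
    outcomes = [succeeds(child) for child in proc.children]
    # AND process: at least one OR descendant succeeds.
    # OR process: all of its AND descendants succeed.
    return any(outcomes) if proc.kind == "AND" else all(outcomes)


unit_clause = Process("OR", leaf_success=True)     # unit clause that unifies
failed_unify = Process("OR", leaf_success=False)   # clause whose head fails to unify
goal = Process("AND", [failed_unify, unit_clause])
clause = Process("OR", [goal, Process("AND", leaf_success=False)])
```

Here `goal` succeeds because one of its OR descendants does, while `clause` fails because one of its AND descendants fails.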

AND parallelism is implemented by dynamically constructing the data flow graph of the literals
in the clause body. To avoid binding conflicts in the parallel execution of sibling AND processes with
shared variables, only one AND process is allowed to be the producer of a shared variable. All the
other AND processes that also contain that shared variable are considered the consumers of that
variable. A consumer process suspends its computation until the values of its input variables
have been received from their producers. A data flow graph of all the literals in the clause body
(the so-called AND literals) is constructed such that a node represents an AND literal and an edge is
directed from the producer of a shared variable to a consumer of that variable. As we shall see,
the ordering algorithm guarantees that the data flow graph is acyclic so as to avoid deadlock.
Communication channels are added to the process tree to model the edges of the data flow graph.
With the communication channels between sibling AND processes, the process tree is no longer a
tree. We prove later that our process tree generates the same results as the corresponding AND/OR
tree.

The input and output annotations in CLP are added to the program optionally by the
programmer to help construct the data flow graph so that more efficient computation can be achieved.
Without explicit variable annotations, the “left to right” order of the AND literals is used for
selecting the producer of a variable. The explicit variable annotations should fulfill the two restrictions
on the data flow graph: one producer per variable and acyclicity of the data flow graph. These can
easily be checked syntactically.
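
As a rough illustration of producer selection, the sketch below derives data flow edges from a clause body. It is a simplification and not the ordering algorithm of Section 3: the function name `data_flow_edges`, the dictionary encoding of literals, and the rule "leftmost literal that mentions the variable without an input annotation" are assumptions made for this example.

```python
def data_flow_edges(body):
    """body: list of (literal name, {variable: annotation}), annotation in {"!", "?", ""}."""
    producer = {}
    # Explicit output annotations win; at most one producer per variable.
    for name, args in body:
        for var, ann in args.items():
            if ann == "!":
                if var in producer:
                    raise ValueError(f"more than one producer for {var}")
                producer[var] = name
    # Assumed default: the leftmost literal that mentions the variable without "?".
    for name, args in body:
        for var, ann in args.items():
            if var not in producer and ann != "?":
                producer[var] = name
    # One edge (producer, consumer, variable) per consumer of a shared variable.
    return sorted((producer[var], name, var)
                  for name, args in body
                  for var in args
                  if var in producer and producer[var] != name)


# Unannotated body a(X,Y), b(X,T), c(Y,S), d(T,S): the diamond shape with
# paths (a,b,d) and (a,c,d).
diamond = [("a", {"X": "", "Y": ""}), ("b", {"X": "", "T": ""}),
           ("c", {"Y": "", "S": ""}), ("d", {"T": "", "S": ""})]
edges = data_flow_edges(diamond)
```

With an explicit annotation such as `p(X?), q(X!)`, the same function makes `q` the producer even though `p` appears first.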

Parallel execution of different OR processes may produce multiple solutions for the output
variables of their father AND process. Those multiple solutions are transmitted along the
communication channels. Hence, we need some mechanism to synchronize the multiple inputs of a
given AND process originating from different sources. In our computation model, any process that
generates or collects a solution transmits the solution without requiring a request. Hence, our model
can be viewed as a multiple-solution data-driven model. With this synchronization mechanism, we
are able to incorporate both AND parallelism and OR parallelism without any form of backtracking.

2.2.1 Synchronization of Multiple Inputs of a Process

Multiple solutions for a variable may be transmitted through the communication channels in
the data flow graph. If one AND process, say p, consumes two inputs from two different sources,
we merge the two input streams to form all the input combinations of process p. Usually, the
input combination of process p is the Cartesian Product of the two input streams. There is one
exception: when the two input streams originate in the same process, the input combination of
p is a set of Cartesian Products over certain portions of the two input streams that derive from
the same output of the common ancestor. In the sequel, we call a set of paths that have the same
starting process and the same ending process a multiple path between these two processes. In Figure
1, there are two paths (a,b,d) and (a,c,d) between process a and process d. If process a binds
(X,Y) to (x1,y1) and (x2,y2), process b binds T to t1 and t2 with input x1 and to t3 with input x2, and
process c binds S to s1 and s2 with input y1 and to s3 and s4 with input y2, then the input combination
for process d should be (t1,s1), (t1,s2), (t2,s1), (t2,s2), (t3,s3), (t3,s4) instead of the full Cartesian
Product of the two input streams. Observe that the first four input pairs of process d are derived
from the input (x1,y1) and the remaining two input pairs are derived from (x2,y2). Because the
two inputs of process d originate in the same process a, we form the Cartesian Product over
the portions of the input streams which are generated by the same output pair of process a, e.g.,
(t1,t2) and (s1,s2), or (t3) and (s3,s4). In order to derive the correct input combination, we mark
process a as a Sync generator, and the outputs generated by process a are separated by a special
Sync signal. The Sync signals are then propagated through processes b and c, and reach process d
in both inputs. Finally, process d detects the same Sync signals in both inputs and then forms the
Cartesian Product over the input portions which are enclosed by the corresponding pair of Sync
signals.

After the data flow graph has been constructed, we determine all the multiple paths in the
graph and mark the starting nodes of those paths as Sync generators. Different Sync generators
generate different Sync signals. A process that receives two or more inputs from different channels
merges the input streams according to the Sync signals carried in each input stream. The Sync
signals may be duplicated during the merge process when they are nested in other Syncs. In the
above example, process a is a Sync generator, hence the output streams generated by process a
should be (Sa, x1, Sa, x2, END) and (Sa, y1, Sa, y2, END) respectively, where Sa represents a Sync
signal generated by process a and “END” represents a special signal indicating the end of the
stream. Likewise, the output streams of process b and process c should be (Sa, t1, t2, Sa, t3, END)
and (Sa, s1, s2, Sa, s3, s4, END) respectively. Therefore, the input combination of process d becomes
(Sa, (t1,s1), (t1,s2), (t2,s1), (t2,s2), Sa, (t3,s3), (t3,s4), END). Once a Sync signal is generated, it is
propagated to (and possibly duplicated in) the other sibling AND processes through the communication
channels in the data flow graph. The Sync signals are removed at the father OR process before
the output streams are sent out to higher level AND processes. Therefore, the Sync signals are
local to the OR process and its AND descendants.
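
The example above can be reproduced with a short sketch. This is not the merge algorithm of the paper: it assumes a single Sync generator, no nested Sync signals, streams that start with a Sync signal, and streams that are fully available as lists. The names `SYNC`, `END`, `split_segments`, and `merge` are hypothetical.

```python
from itertools import product

SYNC, END = "Sa", "END"          # stand-ins for the Sync signal of process a and END


def split_segments(stream):
    """Split a stream into the runs of values that follow each Sync signal."""
    segments = []
    for item in stream:
        if item == END:
            break
        if item == SYNC:
            segments.append([])
        else:
            segments[-1].append(item)   # assumes the stream starts with a Sync signal
    return segments


def merge(left, right):
    """Form the Cartesian Product segment by segment instead of over whole streams."""
    out = []
    for seg_l, seg_r in zip(split_segments(left), split_segments(right)):
        out.append(SYNC)
        out.extend(product(seg_l, seg_r))
    out.append(END)
    return out


stream_b = [SYNC, "t1", "t2", SYNC, "t3", END]
stream_c = [SYNC, "s1", "s2", SYNC, "s3", "s4", END]
merged = merge(stream_b, stream_c)
```

The result contains six pairs rather than the twelve of the full Cartesian Product, because `t3` is only combined with the values of S that derive from the same output of process a.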


2.2.2  Partially  Instantiated  Terms 

When the producer of a variable binds the variable to a partially instantiated term, i.e., a
term containing another variable, a binding conflict may occur if that variable has more than one
consumer. We solve this problem by adding so-called “dynamic links” to the graph to enforce
the “one producer per variable” rule for the newly generated variable.

The data flow graph needs to be changed in two cases: (1) when a variable is bound to a
partially instantiated term and this variable has more than one consumer, and (2) when two or
more variables are bound to terms containing the same variable. In both cases, one of the
consumers of these variables is selected as the producer of the new variable, and the dynamic links
are directed from the new producer to all the other consumers. The information about dynamic
links is not provided during the construction of the data flow graph. Instead, such information is
generated and sent to the selected producer of the new variable when an AND process binds some
output variables to partially instantiated terms. A simple check of the binding values of all the
output variables against the above two cases is sufficient to determine whether dynamic links are
needed and how they are directed. Such a test is similar to DeGroot’s type checking [3], except
that we do the same check in every AND process without consulting the complex graph expression
proposed by DeGroot.
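
A minimal sketch of such a check is given below. It is an illustration under stated assumptions, not the test used in [6]: terms are encoded as Python tuples `(functor, arg, ...)`, variables as strings starting with an upper-case letter, and the lowest-numbered consumer is chosen as the new producer (the text only says that one of the consumers is selected). The names `term_vars` and `dynamic_links` are hypothetical.

```python
def term_vars(term):
    """Variables occurring in a term; variables are capitalized strings."""
    if isinstance(term, str):
        return {term} if term[:1].isupper() else set()
    if isinstance(term, tuple):                  # (functor, arg1, arg2, ...)
        found = set()
        for arg in term[1:]:
            found |= term_vars(arg)
        return found
    return set()


def dynamic_links(bindings, consumers):
    """bindings: {output variable: bound term}; consumers: {output variable: literal ids}.

    Returns {new variable: (chosen producer, remaining consumers)}.
    """
    receivers = {}                               # new variable -> literals that receive it
    for var, term in bindings.items():
        for new_var in term_vars(term):
            receivers.setdefault(new_var, set()).update(consumers.get(var, []))
    links = {}
    for new_var, ids in receivers.items():
        if len(ids) > 1:                         # covers both case (1) and case (2)
            producer, *rest = sorted(ids)        # assumed choice: lowest-numbered consumer
            links[new_var] = (producer, rest)
    return links


case1 = dynamic_links({"X": ("f", "Z")}, {"X": [2, 3]})
case2 = dynamic_links({"X": ("f", "Z"), "Y": ("g", "Z", "a")}, {"X": [2], "Y": [3]})
```

In both calls the new variable `Z` reaches literals 2 and 3, so literal 2 becomes its producer and a dynamic link is directed to literal 3.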

The creation of dynamic links may introduce new Sync generators. The only process which
may become a Sync generator is the producer of the new variable, which becomes a Sync generator
when the node that binds a variable to the partially instantiated term is also a Sync generator.
Those Sync generators are identified and marked after the dynamic links are created. For more
details about dynamic links, see [6].

2.2.3  The  AND  process  and  the  OR  process 

We briefly summarize the major tasks performed by an AND or an OR process. For full details
of the Sync Model, see [6].

AND  process 

-  Call  a  merge  algorithm  to  merge  the  input  streams  and  bind  the  merged  inputs  to  the  input 
variables  of  the  goal  one  at  a  time  if  the  goal  contains  input  variables. 

-  Perform  type  checking  on  the  merged  input  and  create  dynamic  links  if  necessary. 

-  Spawn  OR  processes  and  collect  the  results  for  each  of  the  goals  with  bound  input  variables. 

-  Generate  Sync  signals  to  separate  each  of  its  outputs  if  it  is  a  Sync  generator. 

OR  process 

-  Unify  the  goal  with  a  given  clause. 

-  Return  the  bindings  derived  in  the  unification  followed  by  an  “END”  to  its  father  if  the  given 
clause  is  a  unit  clause. 

-  Construct  the  data  flow  graph  of  the  guard  literals  and  spawn  AND  processes  for  the  guard  if 
the  given  clause  has  a  nonempty  guard. 

-  Construct  the  data  flow  graph  of  the  literals  in  the  body  and  spawn  AND  processes  for  the 
body  if  all  the  AND  processes  for  the  guard  return  true  or  the  given  clause  has  an  empty  guard. 

-  Merge  the  partial  solutions  received  from  its  descendants,  remove  the  Sync  signals  and  send 
the  results  to  its  father. 

3 The Ordering Algorithm

By requiring that each shared variable has exactly one producer, we eliminate binding conflicts.
To construct the data flow graph of AND literals, an Ordering Algorithm is applied in each OR
process. The data flow graph is represented in two ways: by variable annotations in the literals
and by a channel table containing the producer and consumer information of shared variables.

The Ordering Algorithm is performed in an OR process to construct the data flow graph of the
AND literals after unification succeeds and the variables in the clause body have been replaced by their
binding values if they are instantiated by the unification. The Ordering Algorithm consists of three
major steps: (1) the construction of the data flow graph, (2) the refinement of the graph, and (3)
the marking of the Sync generators. In the first step, variable annotations are used to determine the
modes (input or output) of the uninstantiated variables in the AND literals. Initially, all the AND
literals in the clause body are stored in an Undecided Process List (UPL). The algorithm determines
the producer and the consumers of all the variables in the AND literals, adds annotations to all the
variables, and then moves the literals to a Fired Process List (FPL). A Channel Table (CT) is also
constructed to store the producer and consumer information of all the variables. Moreover, the
literals are renumbered during this step so that their numerical order is consistent with their partial
order in the data flow graph. In the second step, the data flow graph is further refined by creating
“selective channels” and “True/False channels” for the literals that generate no output variables.
As we shall see, this step is necessary to exploit the parallelism implied by the program so that a
more efficient data flow graph can be constructed. In the third step, the algorithm searches for all
the multiple paths in the data flow graph. If a multiple path is found, the algorithm marks the
starting node of the multiple path as a Sync generator. The complete algorithm is elaborated
in the remainder of this section.

Data Structures:

The following data structures are used in the algorithm:

•  UPL - a list of AND literal and identifier pairs that are not yet fired†.

•  FPL - a list of fired AND literals with all their variable arguments annotated. Each entry
in the list contains an AND literal with annotated arguments, a Sync attribute, and a
number attached to the literal to enforce a total order.

•  CT - a table of triples (Var, Producer, Consumers-list), to record the producer and
consumers of a variable.

•  S - a stack containing distinct nodes belonging to the paths starting from one specific node.

† “A literal is fired” means that a literal is moved from UPL to FPL and all its variable arguments
are annotated.

In addition, the AND literals are initially numbered 1 to N from left to right in the clause body,
with the goal of the current OR process numbered 0.
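
One possible in-memory encoding of these structures is sketched below. The class and field names (`ChannelEntry`, `FiredLiteral`, and so on) are hypothetical and chosen only for illustration; the paper does not prescribe a concrete representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ChannelEntry:
    """One CT triple (Var, Producer, Consumers-list)."""
    var: str
    producer: Optional[int] = None               # None plays the role of []
    consumers: List[int] = field(default_factory=list)


@dataclass
class FiredLiteral:
    """One FPL entry: annotated literal, Sync attribute, and its number."""
    number: int
    literal: str
    annotations: Dict[str, str]                  # variable -> "!" or "?"
    sync: bool = False


# UPL: (identifier, literal) pairs, numbered 1..N from left to right.
upl: List[Tuple[int, str]] = [(1, "a(X,Y)"), (2, "b(X,T)")]
# CT: an uninstantiated goal variable starts with the goal (number 0) as its consumer.
ct: Dict[str, ChannelEntry] = {"T": ChannelEntry("T", consumers=[0])}
fpl: List[FiredLiteral] = []
# S: the stack used in Step 3, kept as a plain list with a read pointer.
stack: List[int] = []
```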

Algorithm: 

Step  0:  Initialization 

CT := ∅; FPL := ∅;

UPL := list of all literals.

Step  1:  Construction  of  the  data  flow  graph: 

In  this  step,  the  producer  and  the  consumers  of  each  shared  variable  are  chosen  and 
the  variables  in  each  literal  are  annotated. 

A  literal  can  be  fired  iff  (1)  all  its  input  variables  have  a  producer  and  the  producers  are  already 
fired,  and  (2)  the  total  number  of  output  variables,  input  variables,  and  constant  arguments 
of  this  literal  is  at  least  one.  The  first  condition  assures  that  a  producer  of  a  shared  variable 
is  always  fired  before  the  consumers  of  this  variable.  The  second  condition  implies  that  the 
threshold  [14]  of  each  literal  is  one.  If  none  of  the  unfired  literals  satisfies  the  above  conditions, 
the  leftmost  unfired  literal  in  the  clause  body  is  chosen  to  be  fired  next. 

a.  forall v_i: v_i ∈ uninstantiated variables in the goal
        add (v_i, [], [0]) into CT;

b.  forall l: l ∈ UPL
    do forall v_i: v_i ∈ variable arguments in l
       do if v_i ∉ CT → if v_i is output annotated → add (v_i, l, []) into CT
                         | otherwise → add (v_i, [], []) into CT
                         fi
          | v_i ∈ CT → if v_i is output annotated → CT.v_i.producer := l
                        | otherwise → skip
                        fi
          fi
       od
    od;

c.  index := 1;
    while UPL ≠ ∅
    do fired := false;
       forall l: l ∈ UPL
       do forall v_i: v_i is unannotated ∧ CT.v_i.producer ≠ []
             mark v_i as an input variable in UPL;
          b := true;
          forall v_i: v_i is an input variable in l
          do x := CT.v_i.producer;
             b := b ∧ (x ≠ [] ∧ x > N)
          od {b ≡ ∀v_i: v_i is an input variable in l: v_i has a producer and the producer is fired}
          if b ∧ (#constant arguments + #output variables + #input variables > 0) →
               {beginning of firing process}
               newid := index + N;
               forall v_i: v_i ∈ variable arguments in l
               do if v_i is input → add newid into CT.v_i.consumer
                  | v_i is unannotated ∨ v_i is output → CT.v_i.producer := newid;
                                                          mark v_i as an output variable in UPL
                  | otherwise → skip
                  fi
               od;
               UPL := UPL − l;
               FPL[index] := l;
               index := index + 1;
               fired := true
               {end of firing process}
          | otherwise → skip
          fi
       od;

d.     if ¬fired → l := UPL[1];
                    do “firing process”
       | otherwise → skip
       fi
    od;

e.  forall v_i: v_i ∈ CT
    do if CT.v_i.consumer = [] → CT := CT − v_i
       | otherwise → skip
       fi
    od.
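
Step 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not a reference implementation: the function name `order_literals`, the encoding of a literal as `(name, {variable: annotation}, number of constant arguments)`, and the use of `None` for an empty producer are assumptions made here.

```python
def order_literals(goal_vars, body):
    """body: list of (name, {var: ann}, n_consts) with ann in {"!", "?", ""}.

    Returns (fpl, ct): fpl is a list of (number, name, annotated args) and
    ct maps each variable to its producer number and consumer list.
    """
    n = len(body)
    ct = {v: {"producer": None, "consumers": [0]} for v in goal_vars}     # step a
    upl = {i: (name, dict(args), consts)
           for i, (name, args, consts) in enumerate(body, start=1)}
    for lid, (_, args, _) in upl.items():                                 # step b
        for var, ann in args.items():
            entry = ct.setdefault(var, {"producer": None, "consumers": []})
            if ann == "!":
                entry["producer"] = lid
    fpl = []

    def fire(lid):
        name, args, _ = upl.pop(lid)
        newid = len(fpl) + 1 + n                 # index + N
        for var, ann in list(args.items()):
            if ann == "?":
                ct[var]["consumers"].append(newid)
            else:                                # unannotated or output
                ct[var]["producer"] = newid
                args[var] = "!"
        fpl.append((newid, name, args))

    while upl:                                                            # step c
        fired = False
        for lid in list(upl):
            _, args, consts = upl[lid]
            for var, ann in list(args.items()):
                if ann == "" and ct[var]["producer"] is not None:
                    args[var] = "?"              # someone else produces it
            inputs = [v for v, a in args.items() if a == "?"]
            ready = all(ct[v]["producer"] is not None and ct[v]["producer"] > n
                        for v in inputs)
            annotated = sum(1 for a in args.values() if a)
            if ready and consts + annotated > 0:
                fire(lid)
                fired = True
        if not fired:                                                     # step d
            fire(min(upl))                       # leftmost unfired literal
    ct = {v: e for v, e in ct.items() if e["consumers"]}                  # step e
    return fpl, ct


body = [("a", {"X": "", "Y": ""}, 0), ("b", {"X": "", "T": ""}, 0),
        ("c", {"Y": "", "S": ""}, 0), ("d", {"T": "", "S": ""}, 0)]
fpl, ct = order_literals([], body)
```

For the unannotated body `a(X,Y), b(X,T), c(Y,S), d(T,S)`, no literal meets the firing conditions in the first pass, so step d fires `a`; the remaining literals then fire in left-to-right order with the numbers 5 to 8.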

Step 2: Refinement of the Graph

If  some  literals  have  no  output  variables  in  the  data  flow  graph  constructed  in  step  1,  extra 
links  need  to  be  added  into  the  graph  to  make  sure  the  true/false  results  of  this  kind  of  literals 
will  be  transmitted  to  the  goal. 

Let’s  assume  p  is  such  a  literal  and  X  is  an  input  variable  of  p.  In  this  step,  we  first  attempt  to 
add  so-called  selective  channels  from  p  to  the  rest  of  the  consumer  literals  of  X.  These  channels 
transmit  only  the  values  of  X  that  make  p  true.  Meanwhile,  the  links  between  the  producer 
of  X  to  the  consumers  of  X  except  p  are  removed  from  the  graph..  If  no  selective  channel  is 
constructed  for  p,  a  True/False  channel  is  added  from  p  to  the  goal  to  transmit  the  results  of 
P- 

The insertion of selective channels should not cause cycles in the graph. To assure the acyclicity
of the graph, we only add the selective channels such that the receiver of the channel is fired
after all the antecedents of the sender. The antecedents of a literal are the producers of all the
input variables of the literal.

forall l: l ∈ FPL ∧ l has no output variables
do new := false;
   prod := ∅;
   forall v_i: v_i is an input variable of l
      add CT.v_i.producer into prod;
   forall v_i: v_i is an input variable of l
   do c := CT.v_i.consumer;
      cl := {c_i | c_i ∈ c : (∀p_j ∈ prod : c_i > p_j) ∧ c_i ≠ l};
      if l ∈ c ∧ cl ≠ ∅ → add (v_i, l, cl) into CT;
            CT.v_i.consumer := c − cl;
            new := true
      | otherwise → skip
      fi
   od;
   if ¬new → add (t/f, l, [0]) into CT
   | otherwise → skip
   fi
od.
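A minimal Python sketch of this refinement step follows. It is an illustration under assumptions rather than the paper's code: channels are modelled as dictionaries with hypothetical keys var, sender and receivers, and the set prod is taken from the channels that currently deliver inputs to the literal. The input data are the channels that left-to-right producer selection would yield for the seven-literal query of Figure 2.

```python
def refine(channels, no_output_literals):
    """Add selective channels for literals that produce no output variable."""
    true_false = []  # literals that need a True/False channel to the goal
    for lit in sorted(no_output_literals):
        incoming = [ch for ch in channels if lit in ch["receivers"]]
        prod = {ch["sender"] for ch in incoming}  # antecedents of lit
        added = False
        for ch in incoming:
            # receivers fired after every antecedent of lit, excluding lit
            cl = [r for r in ch["receivers"]
                  if r != lit and all(r > p for p in prod)]
            if cl:
                channels.append({"var": ch["var"], "sender": lit, "receivers": cl})
                ch["receivers"] = [r for r in ch["receivers"] if r not in cl]
                added = True
        if not added:
            true_false.append(lit)
    return channels, true_false


# Channels assumed to result from Step 1 for the query of Figure 2.
step1 = [
    {"var": "S", "sender": 1, "receivers": [4]},
    {"var": "C1", "sender": 1, "receivers": [2, 3, 5]},
    {"var": "R", "sender": 2, "receivers": [6]},
    {"var": "P", "sender": 3, "receivers": [7]},
    {"var": "C2", "sender": 4, "receivers": [5, 6, 7]},
]
refined, tf = refine(step1, [5, 6, 7])
c2_links = [(ch["sender"], ch["receivers"]) for ch in refined if ch["var"] == "C2"]
c1_links = [(ch["sender"], ch["receivers"]) for ch in refined if ch["var"] == "C1"]
print(c2_links, c1_links, tf)
```

Under these assumptions C2 ends up travelling from literal 4 to 5, from 5 to 6, and from 6 to 7, the channels for C1 stay unchanged, and only literal 7 needs a True/False channel.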

Step  3:  Marking  of  the  Sync  generators  (Detection  of  the  multiple  paths): 

A stack is built for each literal l in FPL that has more than one output channel. The
descendants of l are pushed onto the stack if they are not yet in the stack. This pushing process
continues until either all the descendants of l are in the stack or a descendant to be added
to the stack is found to be already in the stack. In the second case, l is marked as a SYNC
generator.

forall l : l ∈ FPL ∧ #consumers(l) > 1
do pt := 1; S := [l];
   while S[pt] ≠ ∅
   do p := S[pt];
      forall v_i : v_i is a variable in p
      do if CT.v_i.producer = p →
               forall c_j : c_j ∈ CT.v_i.consumer
               do if c_j ∉ S → push c_j into S
                  | c_j ∈ S → set Sync attribute of l to true in FPL; stop
                  fi
               od
         | otherwise → skip
         fi
      od;
      pt := pt + 1
   od
od
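The multiple-path detection can be sketched in Python as below. This is an illustrative simplification: the data flow graph is given directly as a hypothetical dictionary edges mapping each literal to its direct consumers, instead of being read from the channel table CT. The sample edges are the refined channels assumed for the query of Figure 2.

```python
def mark_sync_generators(edges):
    """Return the literals from which some descendant is reachable twice."""
    sync = set()
    for lit, outs in edges.items():
        if len(outs) <= 1:
            continue  # only literals with several output channels are examined
        stack = [lit]  # plays the role of S; pt scans it from the front
        pt = 0
        found = False
        while pt < len(stack) and not found:
            p = stack[pt]
            for c in edges.get(p, []):
                if c in stack:
                    found = True  # c is reached along a second path
                    break
                stack.append(c)
            pt += 1
        if found:
            sync.add(lit)
    return sync


# Refined graph assumed for the query: 1 feeds 4, 2, 3 and 5; 4 feeds 5; etc.
edges = {1: [4, 2, 3, 5], 2: [6], 3: [7], 4: [5], 5: [6], 6: [7], 7: []}
generators = mark_sync_generators(edges)
print(generators)
```

Literal 1 is marked because literal 5 is reached both directly and through literal 4; every other literal has at most one output channel and is skipped.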

The above algorithm always generates an acyclic data flow graph with one producer per shared
variable. The ordering algorithm is correct for the following reasons:

1. The ordering algorithm selects exactly one producer for each variable.

2. The data flow graph generated in Step 1 is acyclic because a literal can be fired only when
all the producers of its input variables have been fired (b is true in Step 1.c). Therefore, the
producer of a given variable is always fired before all the consumers of that variable. The firing
order of the literals implies their partial order in the data flow graph; thus, the graph has no
cycles.

3. The refined data flow graph generated in Step 2 is acyclic because the redirected links do not
create cycles in the refined graph. If a cycle were found in the refined graph, it would contain
at least one redirected link, say $(l_i, l_j)$. Let the cycle be $(l_i, l_j, \ldots, l_k, l_i)$; then $l_k$ is the producer
of one input variable of $l_i$, and $l_j < l_k$ because a path exists from $l_j$ to $l_k$. In Step 2, such a
link is never generated because $l_j$ is excluded from cl, which admits only consumers that are
fired after every producer in prod. Therefore, the refined graph is also acyclic.


An  Example 
Figure 2 is a query: “Is there a student such that a professor teaches him two different
courses in the same room?” for a database of Students who take Courses (student(S,C)),
Professors who teach Courses (professor(P,C)), and Courses held on certain weekdays and in Rooms
(course(C,D,R)) [10]. To save space, the database relations student, course, and professor are
omitted here.


query(S,P) :- student(S,C1),       (1)
              course(C1,D1,R),     (2)
              professor(P,C1),     (3)
              student(S,C2),       (4)
              C1≠C2,               (5)
              course(C2,D2,R),     (6)
              professor(P,C2).     (7)


Figure  2.  A  Query  for  a  database  of  students 

To answer the query “query(S, P).”, we construct a process tree and map the initial goal
to the root. In the OR process that is spawned by the root, we apply the ordering algorithm
to the seven AND literals in Figure 2.

Since  none  of  the  variables  are  annotated  in  the  definition  of  query,  we  select  the  producers 
of  the  shared  variables  by  imposing  the  left-to-right  order  of  the  literals  and  as  a  consequence,  the 
data  flow  graph  constructed  by  Step  1  is  shown  in  Figure  3. 


Figure 3. The data flow graph of query(S, P)

In Step 1, the literals are renumbered so that their numerical order implies their partial ordering
in the graph. The new identifiers of the literals are enclosed in the parentheses next to each node
in Figure 3. Notice that literals (5), (6), and (7) don’t generate any outputs. After Step 2 adds
selective channels to these literals, the refined data flow graph is as shown in Figure 4. Comparing
it with the previous graph, we find that the variable C2 is transmitted sequentially from literal (4)
to (5), (6) and (7) in the refined graph instead of being transmitted in parallel as in the original
graph. At first glance, the refined graph seems to have less parallelism than the original one. In
fact, the refined graph is more efficient because literal (6) or (7) receives only the values
of C2 for which literal (5) or (6), respectively, is proved true. Therefore, the values of C2 generated
by (4) are first filtered by (5), then sent to (6), and so forth. Unnecessary computations are avoided in
(6) and (7) because invalid values of C2 are never received by them. Also notice that no selective
channels are constructed for C1 at literal (5) because the consumers of C1, (2) and (3), are both
fired before the producer of C2, i.e., (4). To assure the acyclicity of the graph, the channels for C1
remain unchanged.

In Step 3, a stack is built for literal (1). A multiple path is found when (5) is about to be
pushed onto the stack a second time. Therefore, (1) is marked as a Sync generator. No further
stacks are needed because all the other literals have exactly one descendant each.


The average complexity of the ordering algorithm is O(n lg n) with n AND literals. In most
cases, the AND literals in a clause body are almost ordered; therefore, a linear complexity can be
achieved. For a detailed analysis of the complexity of the ordering algorithm, please see [6].

4  The  Merge  Algorithm 

In  the  Sync  Model,  a  process  has  to  handle  multiple  input  streams  from  different  sources. 
For  example,  an  OR  process  has  to  merge  all  the  partial  solutions  from  its  AND  descendants  to 
form  the  solutions  of  this  OR  process,  and  an  AND  process  needs  to  merge  the  input  streams 
from  other  sibling  AND  processes  to  form  input  combinations  to  itself.  It  is  particularly  true  for 
a  nondeterministic  program,  in  which  multiple  partial  solutions  may  be  generated,  transmitted 
and  validated  by  different  processes.  A  merge  algorithm  that  synchronizes  the  execution  of  all  the 
cooperating  processes  is  the  crucial  part  of  our  Sync  Model. 

The  merge  algorithm  in  an  AND  process  is  basically  the  same  as  the  one  in  an  OR  process. 
The  only  difference  is  that  the  input  stream  of  the  latter  one  may  contain  True/False  values  instead 
of  variable  bindings.  In  the  following,  the  merge  algorithm  refers  to  the  one  in  an  AND  process. 

The merge algorithm operates only when there exist two or more input variables in a process.
An input stream consists of SYNC signals, variable bindings, and an END signal at the end. A
variable binding is a pair consisting of a variable name and its binding value. The SYNC signal
carries the process identifier that identifies the generator of the SYNC signal. SYNC signals are
nested when the receiving node belongs to two or more different multiple paths. In essence, the
merge algorithm forms a Cartesian Product over the input streams to form all the possible input
combinations. When SYNC signals appear, the algorithm forms the Cartesian Product over the parts
of the input streams separated by pairs of identical SYNC signals. In other words, only the input
elements in between the corresponding pair of SYNC signals can be combined, and the input streams
are thus synchronized by the SYNC signals.

The rest of this section is organized as follows. Section 4.1 describes the base-case algorithm,
in which no input stream contains SYNC signals. A Cartesian Product implemented as nested loops
is inefficient because the process may keep waiting for the inputs from a slow channel. A more
efficient algorithm is given in Figure 5; it reduces the waiting time by forming the Cartesian
Product over the available portions of the input streams while the rest of the inputs have not yet
arrived. The general algorithm for input streams containing SYNC signals is presented in
Section 4.2. Figure 6 gives the general algorithm for two streams: it recursively peels off SYNC
signals in the two streams and finally forms, with the base-case algorithm, the Cartesian Product
over the data inputs enclosed by the innermost SYNC signal pair. The algorithm for n
streams can be derived by generalizing the two-stream algorithm. In Section 4.3, a correctness
proof for the n-input general merge algorithm is presented.

Throughout the algorithms, buf[i,j] is used to represent the j-th input in the i-th input buffer,
where 1 ≤ i ≤ n and n is the total number of input buffers. Each buffer is assumed to have
enough capacity to store the whole input stream. The index[i] points to the position which is
currently being merged and avail[i] points to the top of the available portion of buffer i. Procedure
put(entry) adds a new element entry into the output queue, where entry can be a SYNC signal,
an array of n input bindings or an “END” signal.

4.1  Base-case  Algorithm 

Since  the  merge  algorithm  is  operating  concurrently  with  the  receiving  of  inputs  in  each  input 
buffer,  the  simple  iterative  loop  implementation  may  be  inefficient  due  to  waiting  for  the  inputs 
from  a  slow  channel.  A  more  efficient  implementation  is  shown  in  Figure  5. 

This  algorithm  forms  the  CP  (abbreviation  for  Cartesian  Product)  over  the  available  portions 
of  the  n  input  streams  repeatedly.  Whenever  an  input  buffer  receives  new  inputs,  Procedure  cp  is 
called  repeatedly  to  form  the  CP  over  the  newly  received  inputs  and  the  available  portions  of  the 
other  input  buffers.  Then  avail[j]  is  advanced  to  the  location  of  the  newest  available  input.  The 
algorithm  repeats  the  above  operations  for  each  input  buffer  until  the  new  input  in  all  the  input 
buffers  is  “END” . 

{ Global Variables }
integer n;                              { number of input buffers }
integer array index[1:n], avail[1:n];   { pointers }
input buffer buf[1:n,1:m];              { n input buffers with length m, which are large enough
                                          to contain the whole input streams }
buffer entry[1:n];                      { a buffer to contain the next output }

{ Cartesian Product of the available portions of the n input buffers except the i-th buffer,
  which is fixed to an element e }
procedure cp(e,i);
begin
   entry[i] := e;
   cp1(i,1)
end.

{ Cartesian Product over the available portions of buf[k] to buf[n] except buf[i] }
procedure cp1(i,k);
begin
   [ k>n → put(entry)
   | k=i → cp1(i,k+1)
   | otherwise → l := 1;
        *[ l≤avail[k] → entry[k] := buf[k,l];
                        cp1(i,k+1);
                        l := l+1
         ]
   ]
end.

{ Main Program }
begin
   i := 1;
   *[ i≤n → index[i] := 1; avail[i] := 0; i := i+1 ];
   *[ ∃k: 1≤k≤n: buf[k,index[k]] ≠ 'END' →
        i := 1;
        *[ i≤n → *[ ¬empty(buf[i,index[i]]) ∧ buf[i,index[i]] ≠ 'END' →
                       cp(buf[i,index[i]],i);
                       index[i] := index[i]+1
                   ];
                  avail[i] := index[i]-1;
                  i := i+1
         ]
    ]
end.

Figure  5.  Base-case  Algorithm 
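The idea behind Figure 5, combining each newly arrived input with whatever is already available in the other buffers, can be sketched in Python as follows. This is an illustration, not a transcription of Figure 5: the class name IncrementalProduct and the method receive are hypothetical, inputs are delivered one at a time by explicit calls instead of being polled from channels, and END handling is left out.

```python
from itertools import product


class IncrementalProduct:
    """Cartesian Product over n buffers, formed as inputs arrive."""

    def __init__(self, n):
        self.buffers = [[] for _ in range(n)]
        self.output = []

    def receive(self, i, element):
        # Fix buffer i to the new element and range over the available
        # portions of all the other buffers, as procedure cp does.
        pools = [[element] if k == i else buf
                 for k, buf in enumerate(self.buffers)]
        self.output.extend(product(*pools))
        # Only now does the new element become part of the available portion.
        self.buffers[i].append(element)


merger = IncrementalProduct(2)
for i, value in [(0, "a1"), (1, "b1"), (0, "a2"), (1, "b2")]:
    merger.receive(i, value)
print(merger.output)
```

Each combination is emitted exactly once, at the moment its last component arrives, so the process never has to wait for a slow channel before using inputs that are already there.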

4.2  General  Algorithm 

If  SYNC  signals  appear  in  at  least  one  input  stream,  the  general  merge  algorithm  applies. 
We  first  present  the  general  algorithm  for  two  input  streams  and  later  show  how  to  generalize  the 
algorithm  to  n  input  streams. 

In  the  ordering  algorithm,  the  literals  have  been  renumbered  so  that  their  numerical  order  is 
compatible  with  their  partial  order  in  the  data  flow  graph.  The  linear  ordering  of  the  Sync  signals 
in  an  input  stream  is  always  assured  by  the  merge  operation  which  performs  an  n-way  merge  on  n 
input  streams. 

The general algorithm consists of two principal operations: merge on the same Sync signals
and merge on different Sync signals. First, let two input streams contain the same Syncs, say
S, and let the two input streams be $A = (S, A_1, S, A_2, \ldots, S, A_n, \mathrm{END})$ and
$B = (S, B_1, S, B_2, \ldots, S, B_n, \mathrm{END})$. Then the merge result is a sequence of CP’s over the
corresponding portions of the two input sequences which are separated by a pair of consecutive S’s, i.e.,

\[ A \times B = (S, A_1 \times B_1, S, A_2 \times B_2, \ldots, S, A_n \times B_n, \mathrm{END}) \tag{1} \]

where $A_j$ stands for a sequence of data inputs, as does $B_j$, for $1 \le j \le n$, and $A_j \times B_j$ stands for the
CP of $A_j$ and $B_j$.

The second principal operation handles the merge of two sequences with different Syncs. Let
the two input streams be $A = (S1, A_1, S1, A_2, \ldots, S1, A_n, \mathrm{END})$ and
$B = (S2, B_1, S2, B_2, \ldots, S2, B_m, \mathrm{END})$, and let S1 < S2 so that S1 becomes the outer Sync in the
merge result. The linear ordering of the Sync signals in a merged stream guarantees that the common
Syncs appearing in two input streams are in the same order; therefore, the merge algorithm
functions correctly. The merge result can be computed as follows:

\[
\begin{aligned}
A \times B &= (S1, A_1 \times B, S1, A_2 \times B, \ldots, S1, A_n \times B, \mathrm{END}) \\
&= (S1, S2, A_1 \times B_1, S2, A_1 \times B_2, \ldots, S2, A_1 \times B_m, \\
&\qquad S1, S2, A_2 \times B_1, S2, \ldots, A_2 \times B_m, \\
&\qquad \vdots \\
&\qquad S1, S2, \ldots, S2, A_n \times B_m, \mathrm{END})
\end{aligned}
\tag{2}
\]

The merge result is actually the CP of all the data inputs of the two streams when the two input
streams contain different Syncs. In order to maintain the synchronization information, we first form
the CP’s over the whole input stream B and a portion of stream A, i.e., $A_i$ for all i, and separate
the CP’s by S1. In each $A_i \times B$, we again form a set of CP’s $A_i \times B_j$ for all j and separate them
by S2. The CP $A_i \times B_j$ contains no Sync signals; hence the base-case algorithm can be applied.
In the result, the number of Sync signals S1 is preserved, i.e., n, and the number of Sync signals S2
is increased to $n \times m$ because S2 is nested inside S1.
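As a small worked instance of the second principal operation, take n = 2 and m = 2 (values chosen here only for illustration). Expanding the formula gives:

```latex
\[
\begin{aligned}
A &= (S1, A_1, S1, A_2, \mathrm{END}), \qquad B = (S2, B_1, S2, B_2, \mathrm{END}), \\
A \times B &= (S1,\; S2, A_1 \times B_1,\; S2, A_1 \times B_2,\;
               S1,\; S2, A_2 \times B_1,\; S2, A_2 \times B_2,\; \mathrm{END}).
\end{aligned}
\]
```

The result contains S1 twice (n = 2) and S2 four times (n × m = 4), as stated above.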

The general algorithm for two input streams is recursively defined on the two principal
operations. The Sync sequences of the input streams are linearly ordered, i.e., a Sync signal is larger
than all the Syncs which are outer to it and smaller than all the Syncs inner to it. In each
recursion, the outermost Sync signals of the two input streams are checked. If they are the same,
the first principal operation is called. If they are different, the second principal operation is called.
The merge algorithm is called recursively to compute each $A_i \times B_i$ in (1) or each $A_i \times B$ in (2).
When the merge algorithm is called to merge two input streams without any Sync signals, the
base-case algorithm is applied to get the CP. The merge result preserves the linear ordering of
the Sync sequence. Figure 6 presents the major procedures of the merge algorithm: merge and
scanto. Procedure merge merges the input streams in buf1 and buf2, and puts the result in an
output queue. Boolean function sync checks whether the given argument is a Sync signal or not.
Procedure merge has a guarded command with four alternatives: (1) neither of the inputs contains
Sync signals: cp is called to derive the Cartesian Product; (2) either buf1 contains Sync signals and
buf2 does not, or both inputs have Sync signals and the outermost Sync of buf1 is smaller than
that of buf2: the second principal operation applies; (3) the same condition as (2) with buf1 and buf2
switched: the second principal operation also applies with A and B switched; and (4) both inputs
contain Sync signals and the outermost Syncs of the two inputs are the same: the first principal
operation applies. Procedure scanto divides the input buffer into two parts at the next occurrence
of the specified SYNC signal S after the leading one. Procedure cp is the base-case merge algorithm
which generates the CP of the data elements in two buffers.

procedure merge(buf1,buf2)
begin
   [ buf1=∅ ∨ buf1="END" ∨ buf2=∅ ∨ buf2="END" → skip
   | otherwise → A := buf1[1]; B := buf2[1];
        [ ¬sync(A) ∧ ¬sync(B) → cp(buf1,buf2)                            (1)
        | sync(A) ∧ (¬sync(B) ∨ (A<B)) → scanto(buf1,A,buf11,buf12);     (2)
                                          put(A);
                                          merge(buf11,buf2);
                                          merge(buf12,buf2)
        | sync(B) ∧ (¬sync(A) ∨ (B<A)) → scanto(buf2,B,buf21,buf22);     (3)
                                          put(B);
                                          merge(buf1,buf21);
                                          merge(buf1,buf22)
        | sync(A) ∧ sync(B) ∧ (A=B) → scanto(buf1,A,buf11,buf12);        (4)
                                       scanto(buf2,B,buf21,buf22);
                                       put(A);
                                       merge(buf11,buf21);
                                       merge(buf12,buf22)
        ]
   ]
end of procedure merge.

procedure scanto(buf,S,buf1,buf2)
begin
   i := 2;
   *[ buf[i]≠S ∧ buf[i]≠"END" → buf1[i-1] := buf[i]; i := i+1 ];
   j := 1; N := length(buf);
   *[ i≤N → buf2[j] := buf[i]; i := i+1; j := j+1 ]
end of procedure scanto.


Figure  6.  General  Algorithm  for  two  buffers 
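The two-stream procedure can be sketched in Python as follows. This is a hedged illustration rather than a transcription of Figure 6: streams are plain lists without the END signal, a Sync signal is modelled by a hypothetical dataclass Sync carrying the generator id, scanto returns its two parts instead of filling output buffers, and the base case uses a simple nested comprehension in place of procedure cp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sync:
    id: int  # identifier of the Sync generator


def scanto(buf, s):
    """Split buf, which starts with s, at the next occurrence of s."""
    i = 1
    while i < len(buf) and buf[i] != s:
        i += 1
    return buf[1:i], buf[i:]


def merge(buf1, buf2, out):
    if not buf1 or not buf2:
        return
    a, b = buf1[0], buf2[0]
    a_sync, b_sync = isinstance(a, Sync), isinstance(b, Sync)
    if not a_sync and not b_sync:
        out.extend((x, y) for x in buf1 for y in buf2)  # base case: plain CP
    elif a_sync and (not b_sync or a.id < b.id):
        head, rest = scanto(buf1, a)                    # a is the outer Sync
        out.append(a)
        merge(head, buf2, out)
        merge(rest, buf2, out)
    elif b_sync and (not a_sync or b.id < a.id):
        head, rest = scanto(buf2, b)                    # b is the outer Sync
        out.append(b)
        merge(buf1, head, out)
        merge(buf1, rest, out)
    else:                                               # same Sync in both
        head1, rest1 = scanto(buf1, a)
        head2, rest2 = scanto(buf2, b)
        out.append(a)
        merge(head1, head2, out)
        merge(rest1, rest2, out)


S, S1, S2 = Sync(1), Sync(1), Sync(2)

same = []       # first principal operation, as in (1)
merge([S, "a1", "a2", S, "a3"], [S, "b1", S, "b2", "b3"], same)

nested = []     # second principal operation, as in (2)
merge([S1, "a1", S1, "a2"], [S2, "b1", S2, "b2"], nested)
print(same)
print(nested)
```

With identical Syncs only corresponding segments are combined; with different Syncs the smaller one ends up outermost and the larger one is repeated inside every outer segment.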

If there are more than two input buffers and some of them have one or more SYNC signals, the
above algorithms can be generalized easily. With n input streams, each of which has an ordered
Sync sequence, the merge algorithm applies recursively to remove the smallest Sync signal among the
n outermost ones of the input streams, one at a time. When the smallest Sync is common to
several input streams, all those Syncs are removed at once. When none of the input streams
contains Sync signals, the Cartesian Product over the n input streams is performed. For instance, if
merge(buf1, buf2, ..., bufn) is called and a smallest Sync S is found in both bufi and bufj, the
following program is executed:

scanto(bufi,S,bufi1,bufi2);
scanto(bufj,S,bufj1,bufj2);
put(S);
merge(buf1,...,bufi1,...,bufj1,...,bufn);
merge(buf1,...,bufi2,...,bufj2,...,bufn);

The  merge  algorithm  in  an  OR  process  merges  the  partial  solutions  received  from  its  AND 
descendants  to  form  all  the  legal  solutions  of  this  OR  subtree.  The  partial  solutions  received  from 
one  AND  descendant  could  be  variable  bindings  or  true/false  values.  The  true/false  values  are  used 
to  select  the  merge  result  from  other  channels.  If  the  value  is  true,  the  merge  algorithm  merges  the 
partial  solutions  as  usual.  If  the  value  is  false,  the  merge  algorithm  skips  the  merge  operation  and 
returns  false  instead.  In  addition,  the  merge  algorithm  in  an  OR  process  eliminates  all  the  Sync 
signals  in  the  merge  result  so  that  the  solution  stream  sent  up  to  the  father  AND  process  contains 
no  Sync  signals. 

4.3  Correctness  Proof 

In  order  to  prove  that  the  merge  algorithm  produces  all  the  correct  combinations  of  multiple 
inputs  of  a  process,  we  shall  define  the  syntactic  structure  of  an  “input  stream”  and  give  a  formal 
treatment  of  how  an  AND  process  transforms  one  or  more  input  streams  into  an  output  stream. 

Definition: An input stream $\Sigma_R(D)$ can be defined recursively:

1. $\Sigma_{\emptyset}(D) = D$

2. $\Sigma_{R \cup \{i\}}(D) = \Sigma_R(\Sigma_{\{i\}}(D)) = \Sigma_R((S_i, D_v)^{n_i})$, $\forall j \in R: i > j$,

where R is an ordered set of integers. Each element in R is a Sync that appears in the input stream.
Let’s call R the Sync sequence of this input stream. We slightly abuse notation and represent R by
the array such that $i < j \Rightarrow R[i] < R[j]$. $\Sigma_R$ is an operator defined recursively over the input data,
D, where D is the input stream with all the Syncs removed. Applying $\Sigma_{\{i\}}$ over D is to divide
D into $n_i$ groups and separate each group by a Sync $S_i$. Each group of input data, $D_v$, is called
a data segment, which is uniquely identified by a vector, v. In (2), v is a vector of length (r + 1)
where |R| = r and v[r + 1] = k for $1 \le k \le n_i$. Therefore, $D_v$ represents a data segment that is
produced by the k-th output of the Sync generator $S_i$. Besides, $(S_i, D_v)^{n_i}$ is a regular expression
denoting the concatenation of the string $(S_i, D_v)$ $n_i$ times. Notice that the data segment $D_v$ is
changed every time the syntactic structure of the input stream is transformed. The above notation
is used to represent the syntactic structure of an input stream. How $D_v$ is changed by different
transformations of the input stream will be explained later.

There  are  two  ways  of  changing  the  structure  of  an  input  stream  in  our  model.  First,  if  an 
AND  process  is  a  Sync  generator,  the  structure  of  the  output  stream  is  derived  by  concatenating 
an  extra  Sync  signal  to  the  Sync  sequence  of  the  input  stream.  Second,  if  an  AND  process  has 
several  inputs,  say  n,  the  structure  of  the  merge  output  can  be  derived  by  an  n-way  merge  of  the 
n  Sync  sequences.  Figure  7  shows  the  two  possible  transformations  of  an  AND  process  given  one 
input and one output. In Figure 7.a, the structures of the input and the output streams are the
same because the AND process is not a Sync generator. In Figure 7.b, the AND process is a Sync
generator which generates Sync $S_i$, and the output stream has the structure $\Sigma_{R \cup \{i\}}(D)$. Because of the
total ordering of the Sync generators, i is guaranteed to be larger than any element in R. Figure
8  shows  the  input-output  transformation  of  the  merge  algorithm,  given  n  input  streams.  The  Sync 
sequence  of  the  output  is  derived  by  n-way  merge  of  the  n  input  Sync  sequences.  An  AND  process 
with  n  inputs  and  one  output  can  be  represented  by  one  merge  operation  (Figure  8)  followed  by  one 
of  the  two  AND  operations  (Figure  7)  depending  on  whether  the  AND  process  is  a  Sync  generator 
or  not. 


Figure  7.  The  transformation  of  an  AND  process  with  single  input 
Figure 8. The transformation of the merge algorithm with two inputs

The  data  segments  in  the  input  stream  are  changed  differently  in  the  two  transformations 
described  above.  Since  we  are  only  interested  in  the  merge  result,  we’ll  only  consider  the  second 
case,  i.e.,  the  transformation  due  to  the  merging  of  two  input  streams. 

Definition: An ordered union operator “$\dot{\cup}$” is defined as $R = R_1 \mathbin{\dot{\cup}} R_2$, where R, $R_1$ and $R_2$ are
ordered sets (i.e., the elements in the set are sorted in ascending order) and $R = R_1 \cup R_2$. In other
words, it is equivalent to a two-way merge.

Definition: An ordered join operator “$\sqcup$” is defined as $v^R = v^{R_1} \sqcup v^{R_2}$, where $R = R_1 \mathbin{\dot{\cup}} R_2$, and $v^R$,
$v^{R_1}$ and $v^{R_2}$ are vectors with length $|R|$, $|R_1|$ and $|R_2|$ respectively. $v^R$ is the result of joining $v^{R_1}$
and $v^{R_2}$ on the common elements of $R_1$ and $R_2$. More precisely, $v^R = v^{R_1} \sqcup v^{R_2}$ iff

1. $\forall i, j: R_1[i] = R_2[j] \Rightarrow v^{R_1}[i] = v^{R_2}[j]$, and

2. $v^R[i] = \begin{cases} v^{R_1}[j], & \text{if } R[i] = R_1[j] \\ v^{R_2}[k], & \text{if } R[i] = R_2[k]. \end{cases}$
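The two operators can be illustrated with a short Python sketch. The function names ordered_union and ordered_join are hypothetical, Sync sequences are lists of integers, and an id vector is a list aligned position by position with its Sync sequence; returning None when the vectors disagree on a common Sync is a choice made here for illustration.

```python
def ordered_union(r1, r2):
    """Two-way merge of two ascending Sync sequences without duplicates."""
    return sorted(set(r1) | set(r2))


def ordered_join(r1, v1, r2, v2):
    """Join id vectors v1 (over r1) and v2 (over r2) on their common Syncs.

    Returns the joined vector over ordered_union(r1, r2), or None when the
    two vectors disagree on a common Sync (condition 1 fails).
    """
    by_sync1 = dict(zip(r1, v1))
    by_sync2 = dict(zip(r2, v2))
    for sync in by_sync1.keys() & by_sync2.keys():
        if by_sync1[sync] != by_sync2[sync]:
            return None
    combined = {**by_sync1, **by_sync2}
    return [combined[sync] for sync in ordered_union(r1, r2)]


r = ordered_union([1, 3], [1, 4])
joined = ordered_join([1, 3], [2, 5], [1, 4], [2, 7])
mismatch = ordered_join([1, 3], [2, 5], [1, 4], [9, 7])
print(r, joined, mismatch)
```

Here Sync 1 is common to both sequences, so the two vectors can be joined only when they carry the same output index for it.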


Theorem 1. Given two input streams $\Sigma_{R_A}(D^a)$ and $\Sigma_{R_B}(D^b)$, the result generated by the merge
algorithm is $\Sigma_{R_C}(D^c)$, where $R_C = R_A \mathbin{\dot{\cup}} R_B$. Moreover, $D^c$ is defined as the Cartesian Product of
$D^a$ and $D^b$ such that

\[ D^c_{v_c} = D^a_{v_a} \times D^b_{v_b} \quad \text{with } v_c = v_a \sqcup v_b. \tag{3} \]

Proof: Let the lengths of $R_A$ and $R_B$ be $t_a$ and $t_b$ respectively. This theorem can be proved by
induction on the ordered pair $(t_a, t_b)$, where $(t_a, t_b) < (t'_a, t'_b)$ iff $t_a < t'_a$, or $t_a = t'_a$ and $t_b < t'_b$.
It is easy to derive the proof from the program in Figure 6. The complete proof is given in
[6]. ■

From Theorem 1, we derive the merge result with two arbitrary input streams. The remaining
task is to show that the merge result is correct. Given a process with two inputs, a legal input
combination is an input pair whose elements originate from the same
output of a common ancestor along the two input paths. An input path is a path that contains the
current process and one of the two input links, and traces back to any ancestor of the current process.
There are many such paths. If a process is shared by any two input paths, each containing
a different input link, then only the inputs which are derived from the same output of that process
can be combined. Notice that such a common ancestor is marked as a Sync generator. Therefore, by
observing the Sync sequences of the two input streams, we can determine all the common ancestors
which affect the merge result along the two input paths.

Theorem  2  shows  that  the  merge  result  in  Theorem  1  indeed  contains  all  the  legal  input 
combinations. 


Theorem  2.  The  result  of  the  merge  algorithm  contains  all  the  legal  input  combinations. 

Proof: Suppose that the two input streams in Theorem 1 have n common Sync signals, i.e.,
$|R_A \cap R_B| = n$; we need to prove that all the inputs that are derived from the same outputs
generated by the n Sync generators are combined. Let $P_i$ be the Sync generator that generates a
Sync signal $S_i$. Then, each output generated by $P_i$ is separated by a pair of $S_i$’s. By propagating
the output stream of $P_i$ throughout the data flow graph, the syntactic structure of the output
stream may or may not be changed. If the syntactic structure of the output stream is not changed,
any result derived by the k-th output of $P_i$ appears in the same data segment enclosed by the
corresponding pair of $S_i$’s. When the syntactic structure of the output stream is changed by merge
operations or the generation of new Syncs, $S_i$ may be further nested into other Sync signals. In
this case, the results derived by the k-th output of $P_i$ are divided into several data segments and
spread into different locations. Generally speaking, with the input stream A in Theorem 1, the
inputs that are derived by the k-th output of process $P_i$ are the union of all the data segments
with the j-th element of its id vector being k, where j is the position at which $S_i$ is placed in the Sync
sequence $R_A$:

\[ \bigcup_{\forall v_a:\; v_a[j] = k \,\wedge\, R_A[j] = i} D^a_{v_a} \tag{4} \]

where $\bigcup_{\forall v_a:\; v_a[j] = k \,\wedge\, R_A[j] = i}$ is used as an abbreviation for a sequence of unions with the index $v_a$
satisfying the condition specified in the subscript of $\bigcup$.


Assume there are n common Syncs, $S_{i_1}, S_{i_2}, \ldots, S_{i_n}$, in the two input streams. We will show
that the merge result in the case of the Sync generator $P_{i_j}$ generating the $t_j$-th output, for all j,
$1 \le j \le n$, is the Cartesian Product of the portions of the two input streams under the same
condition. Let $k_j$, $l_j$ and $m_j$ be the locations where $S_{i_j}$ appears in $R_A$, $R_B$ and $R_C$, i.e.,
$R_A[k_j] = R_B[l_j] = R_C[m_j] = i_j$, for $1 \le j \le n$. Then the above relation can be formulated as follows:

\[
\bigcup_{\substack{\forall v_c: \\ (\forall j:\, 1 \le j \le n:\; v_c[m_j] = t_j)}} D^c_{v_c}
\;=\;
\bigcup_{\substack{\forall v_a: \\ (\forall j:\, 1 \le j \le n:\; v_a[k_j] = t_j)}} D^a_{v_a}
\;\times\;
\bigcup_{\substack{\forall v_b: \\ (\forall j:\, 1 \le j \le n:\; v_b[l_j] = t_j)}} D^b_{v_b}.
\tag{5}
\]

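Eq. (5) can be derived from Eq. (3) easily. First add a big union
$\bigcup_{\forall v_c:\, (\forall j:\, 1 \le j \le n:\; v_c[m_j] = t_j)}$ to both
sides of (3). Then divide the unions at the right hand side into two independent sets of unions,
move the unions inside the CP, and associate the first set of unions with $D^a$ and the second set
of unions with $D^b$.

\[
\begin{aligned}
\bigcup_{\substack{\forall v_c: \\ (\forall j:\, 1 \le j \le n:\; v_c[m_j] = t_j)}} D^c_{v_c}
&= \bigcup_{\substack{\forall v_c: \\ (\forall j:\, 1 \le j \le n:\; v_c[m_j] = t_j)}} \left( D^a_{v_a} \times D^b_{v_b} \right) \\
&= \bigcup_{\substack{\forall (v_a \sqcup v_b): \\ (\forall j:\, 1 \le j \le n:\; v_a[k_j] = v_b[l_j] = t_j)}} \left( D^a_{v_a} \times D^b_{v_b} \right) \\
&= \bigcup_{\substack{\forall v_a: \\ (\forall j:\, 1 \le j \le n:\; v_a[k_j] = t_j)}} \;
   \bigcup_{\substack{\forall v_b: \\ (\forall j:\, 1 \le j \le n:\; v_b[l_j] = t_j)}} \left( D^a_{v_a} \times D^b_{v_b} \right) \\
&= \bigcup_{\substack{\forall v_a: \\ (\forall j:\, 1 \le j \le n:\; v_a[k_j] = t_j)}} D^a_{v_a}
   \;\times\;
   \bigcup_{\substack{\forall v_b: \\ (\forall j:\, 1 \le j \le n:\; v_b[l_j] = t_j)}} D^b_{v_b}.
\end{aligned}
\]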

Therefore, we can conclude that the merge algorithm gives all the legal input combinations. ■


With  the  above  theorems,  we  can  show  that  the  Sync  Model  is  complete,  i.e.,  the  Sync  Model 
generates  all  the  solutions  for  a  given  program. 


From Kowalski [4], we know that each successful computation of an initial goal can be
represented as a subtree of the AND/OR tree, i.e., the process tree in our model. Such a subtree
starts from the root, expands by including exactly one descendant OR process for each of its AND
processes and all the descendant AND processes for each of its OR processes, and ends with leaf nodes
that terminate successfully.
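
The shape of such subtrees can be sketched as a short enumeration. The encoding below is an assumption made for illustration, not the paper's data structure: an AND process is a tuple `("and", goal, or_children)`, an OR process is a tuple `("or", clause, and_children)`, a unit clause is an OR process with no children, and the root is taken to be the AND process of the initial goal.

```python
from itertools import product


def subtrees_of_and(node):
    """Yield solution subtrees of an AND process: pick exactly one OR child."""
    _, goal, or_children = node
    for child in or_children:
        for sub in subtrees_of_or(child):
            yield (goal, sub)


def subtrees_of_or(node):
    """Yield solution subtrees of an OR process: include all AND children."""
    _, clause, and_children = node
    per_child = [list(subtrees_of_and(c)) for c in and_children]
    # product() over an empty list yields one empty tuple, so a unit clause
    # is a successful leaf; a child with no subtree makes the clause fail.
    for combo in product(*per_child):
        yield (clause, list(combo))


# Hypothetical program: g <- p, q.   g.   p has two facts, q has one.
tree = ("and", "g", [
    ("or", "g <- p, q", [
        ("and", "p", [("or", "p1", []), ("or", "p2", [])]),
        ("and", "q", [("or", "q1", [])]),
    ]),
    ("or", "g.", []),
])

solutions = list(subtrees_of_and(tree))
for s in solutions:
    print(s)
```

In this toy tree the first clause contributes two subtrees (one per fact for `p`) and the unit clause contributes one more.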

Since any successful computation can be mapped onto a subtree in the Sync Model, if we can
prove that such a subtree generates the same solution as this successful computation, then the Sync
Model is proved to be complete.

Theorem  3.  The  Sync  Model  is  complete. 

Proof: We first prove that a subtree that represents a successful computation generates the
same solution as this computation. Choose any OR process in a subtree that corresponds
to a successful computation. Assume this OR process contains a goal $g$ and a clause
“$g \leftarrow p_1, p_2, \ldots, p_n$”. Let $X_1, X_2, \ldots, X_m$ be the variables within this clause, and let the successful
computation give a unique solution to these variables, i.e., $t_1, t_2, \ldots, t_m$. Moreover, let each $p_i$
contain a set of input variables and a set of output variables. The input-variable set and the
output-variable set of any $p_i$ are disjoint, and both of them are subsets of $\{X_1, \ldots, X_m\}$.

Let us assume that the subtree under each $p_i$ produces the correct solutions for the output
variables of $p_i$ if the input variables are bound to the correct values. Here, the correct solution of a
variable $X_i$ is meant to be $t_i$. Therefore, any process $p_i$ that has no input variables will generate the
correct solutions to its output variables. Furthermore, any $p_i$ with a nonempty input-variable set will
produce the correct solutions to its output variables if the producers of its input variables generate
the correct solutions. This statement is obviously true if $p_i$ has only one input variable. It is
also true if $p_i$ has more than one input variable because, by Theorem 2, the merge algorithm in $p_i$
always generates the correct input combinations. Therefore, the OR process generates the correct
solution for its goal $g$, assuming the subtrees under each $p_i$ are correct. Furthermore, if the
subtree corresponding to a successful computation contains an OR process with a unit
clause, this OR process is always a leaf node and it generates the correct solutions to the output
variables of the goal in the process. Thus, by induction, the subtree corresponding to a successful
computation will generate the correct solution for that computation.

From the other direction, we shall also prove that any minimal subtree which produces an
answer corresponds to a successful computation. A minimal subtree is a subtree which contains no
failure nodes. The proof is similar to the proof above and is thus omitted here.

Since any successful computation can be mapped onto a subtree in the Sync Model and each
subtree generates the correct solution for the corresponding computation, we conclude that the
Sync Model generates all the solutions for a given program and therefore it is complete. ∎

5  Conclusion 

We have presented a model for the parallel execution of logic programs on a message-passing
multiprocessor system. AND parallelism is carried out by constructing an efficient data
flow graph dynamically. The mechanism that is used to synchronize the multiple partial solution
flows in the data flow graph makes it possible to realize both AND parallelism and OR parallelism
without any form of backtracking.

Our  model  is  complete.  It  handles  both  deterministic  and  non-deterministic  programs,  and  it 
is  particularly  good  for  non-deterministic  programs  with  multiple  solutions.  It  is  able  to  handle  a 
pure  logic  program  as  well  as  an  extended  logic  program  with  variable  annotations  and  guarded 
clauses. 

In our model, the AND/OR tree is searched in both a breadth-first and a depth-first manner.
Consider two sibling AND processes that share a common variable. The subtree under the producer
of the variable will be searched first and then the search for the consumer and its subtree can be
started. If the producer produces multiple solutions to the variable, the execution of the two sibling
AND processes is pipelined. Although this approach seems to be less parallel than a purely
breadth-first search of the AND/OR tree, our model is in fact more efficient because we avoid unnecessary
computations in the consumer process. In a purely breadth-first search, invalid bindings of the
shared variable are sent to the consumer and later found invalid by a process in the subtree of the
producer.
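
The pipelining between a producer and a consumer of a shared variable can be pictured with lazy streams. The sketch below is a single-threaded approximation that models only the order of events, not real parallel processes or Sync signals. The names `producer`, `consumer` and `log`, and the even-number test standing in for the producer's subtree, are hypothetical.

```python
def producer(log):
    """Hypothetical producer of X: emits only bindings its subtree accepts."""
    for x in range(1, 7):
        if x % 2 == 0:  # stands in for the producer's subtree validating x
            log.append(f"produce X={x}")
            yield x


def consumer(xs, log):
    """Hypothetical consumer: starts work as soon as one binding arrives."""
    for x in xs:
        log.append(f"consume X={x}")
        yield (x, x * x)


log = []
solutions = list(consumer(producer(log), log))
print(solutions)
print(log)
```

The log interleaves production and consumption, so the consumer begins before the producer has finished, and only three of the six candidate bindings ever reach it.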

We believe that any form of backtracking - “naive” or “intelligent” - should be totally
eliminated from an OR-parallel model of logic programming. Backtracking simply means complicated
control and high overhead. The synchronization mechanism proposed in the Sync Model is clean
and simple. Although we need extra Synchronization signals, we don’t need to send the complete
set of bindings and thus the overhead is actually lower.

Our model can be modified to handle stream parallelism as well. Extended with tail recursion
optimization [6], our model becomes an efficient parallel model that exploits all kinds of parallelism
inherent in a logic program. The mapping from the Sync Model onto the Sneptree, which is chosen
as the target machine for our model, is found to have minimal mapping cost in terms of load
balancing and communication overhead. Therefore, it is feasible to construct a message-passing
multiprocessor system based on the Sneptree architecture to implement the Sync Model effectively.